The Distribution and Deposition Algorithm for Multiple Sequences Sets

Authors

  • Kang Ning
  • Hon Wai Leong
Abstract

A “sequences set” is a mathematical model used in many applications such as scheduling, text processing and biological sequence analysis. As the number of sequences grows, the “single” sequences set model is no longer appropriate for the rapidly increasing problem sizes. For example, more and more text processing applications split a single large text file into multiple files before processing. For these applications, the underlying mathematical model is “multiple sequences sets” (MSS). Although MSS are increasingly used, there is little research on how to process them efficiently. To process multiple sequences sets, sequences are first distributed to different sets, and the sequences in each set are then processed. Deriving effective algorithms for MSS processing is both interesting and challenging. In this paper, we formulate the problem of Processing of Multiple Sequences Sets (PMSS) by first defining the cost functions and the performance ratio. Based on these, the PMSS problem is formulated as minimizing the total cost of processing. We propose two greedy algorithms for the PMSS problem, which generalize algorithms for a single sequences set. Then, based on an analysis of the features of multiple sequences sets, we propose the Distribution and Deposition (DDA) algorithm and the DDA* algorithm for the PMSS problem. In the DDA algorithm, the sequences are first distributed to multiple sets according to their alphabet contents; the sequences in each set are then processed by the deposition algorithm. The DDA* algorithm differs from DDA in that it distributes sequences by clustering based on a set of sequence features (alphabet content is one of these features). Experiments show that DDA and DDA* always output results with smaller costs than the other algorithms, and that DDA* outperforms DDA in most instances. This indicates that distributing sequences to multiple sets according to sequence features, before processing the sequences in each set, is beneficial. The DDA and DDA* algorithms are also efficient in both time and space.
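The two-phase idea in the abstract (distribute by alphabet content, then process each set) can be illustrated with a minimal Python sketch. The abstract does not define the cost function or the deposition step, so everything concrete here is an assumption: the function names are hypothetical, distribution uses a simple nearest-seed rule on letter-frequency profiles, and the per-set processing uses the well-known Majority Merge heuristic for a common supersequence as a stand-in for the deposition algorithm, with supersequence length as the assumed cost.

```python
from collections import Counter


def alphabet_profile(seq, alphabet):
    """Relative frequency of each alphabet symbol in seq."""
    counts = Counter(seq)
    total = max(len(seq), 1)  # avoid division by zero for empty sequences
    return [counts[a] / total for a in alphabet]


def distance(p, q):
    """L1 distance between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def distribute(sequences, k):
    """Assumed distribution step: split sequences into at most k sets.

    Seeds are chosen farthest-first on alphabet profiles; every sequence
    then joins the set of its nearest seed. This is an illustrative rule,
    not the one used in the paper.
    """
    alphabet = sorted(set("".join(sequences)))
    profiles = [alphabet_profile(s, alphabet) for s in sequences]
    seeds = [0]
    while len(seeds) < min(k, len(sequences)):
        nxt = max(
            range(len(sequences)),
            key=lambda i: min(distance(profiles[i], profiles[s]) for s in seeds),
        )
        seeds.append(nxt)
    sets = [[] for _ in seeds]
    for i, s in enumerate(sequences):
        j = min(range(len(seeds)),
                key=lambda j: distance(profiles[i], profiles[seeds[j]]))
        sets[j].append(s)
    return sets


def majority_merge(seqs):
    """Stand-in for the per-set processing step (NOT the deposition algorithm).

    Builds a common supersequence by repeatedly emitting the symbol that
    is the first character of the most remaining sequences.
    """
    remaining = [s for s in seqs if s]
    out = []
    while remaining:
        sym = Counter(s[0] for s in remaining).most_common(1)[0][0]
        out.append(sym)
        remaining = [s[1:] if s[0] == sym else s for s in remaining]
        remaining = [s for s in remaining if s]
    return "".join(out)


def total_cost(sets):
    """Assumed cost: sum of supersequence lengths over all sets."""
    return sum(len(majority_merge(group)) for group in sets)


if __name__ == "__main__":
    demo = ["AAAA", "AAAT", "CCCC", "CCCG"]
    for group in distribute(demo, 2):
        print(group, majority_merge(group))
```

In this toy input, the A-rich and C-rich sequences land in separate sets because their letter-frequency profiles are far apart. Whether such a split lowers the total cost depends on the cost function actually used in the paper, which the abstract does not specify.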

Similar resources

An Application of the ABS LX Algorithm to Multiple Sequence Alignment

We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...

Strong convergence theorem for a class of multiple-sets split variational inequality problems in Hilbert spaces

In this paper, we introduce a new iterative algorithm for approximating a common solution of certain class of multiple-sets split variational inequality problems. The sequence of the proposed iterative algorithm is proved to converge strongly in Hilbert spaces. As application, we obtain some strong convergence results for some classes of multiple-sets split convex minimization problems.

A New Algorithm for the Discrete Shortest Path Problem in a Network Based on Ideal Fuzzy Sets

A shortest path problem is a practical issue in networks for real-world situations. This paper addresses the fuzzy shortest path (FSP) problem to obtain the best fuzzy path among fuzzy paths sets. For this purpose, a new efficient algorithm is introduced based on a new definition of ideal fuzzy sets (IFSs) in order to determine the fuzzy shortest path. Moreover, this algorithm is developed for ...

MMDT: Multi-Objective Memetic Rule Learning from Decision Tree

In this article, a Multi-Objective Memetic Algorithm (MA) for rule learning is proposed. Prediction accuracy and interpretation are two measures that conflict with each other. In this approach, we consider accuracy and interpretation of rules sets. Additionally, individual classifiers face other problems such as huge sizes, high dimensionality and imbalance classes’ distribution data sets. This...

THE ENTROPIES OF THE SEQUENCES OF FUZZY SETS AND THE APPLICATIONS OF ENTROPY TO CARDIOGRAPHY

In this paper, firstly we have introduced the entropy of sequences of fuzzy sets and given some theorems about it. Secondly, the waves P and T which appear in electrocardiograms were transferred to fuzzy sets, by using the definition of entropy for sequences of fuzzy sets, and some numerical values were obtained for sequences of waves P and T. Thus any person can make medical predictions for some cardiac...

Journal:
  • CoRR

Volume abs/0904.1242

Publication date 2009